Skip to content

Conversation

@rowanc1
Copy link
Member

@rowanc1 rowanc1 commented Aug 12, 2025

This improves the markdown formatting output.

@changeset-bot
Copy link

changeset-bot bot commented Aug 12, 2025

🦋 Changeset detected

Latest commit: dd786b9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 6 packages
Name Type
myst-parser Patch
myst-to-md Patch
myst-roles Patch
myst-directives Patch
myst-to-html Patch
myst-transforms Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@rowanc1 rowanc1 added the enhancement New feature or request label Aug 12, 2025
Copy link
Member Author

@rowanc1 rowanc1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlighting changes below. We need to get the parsing data onto the mystDirective/Role nodes, so that we have an easier time in formatting. Would love some thoughts on direction here.

directives: [cardDirective],
});
expect(output).toEqual(expected);
expect(output).toMatchObject(expected);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests in the extension code are made a bit less strict here. No changes to the code, so no need to bump the packages.

if (argSpec.required && data.arg == null) {
validationError = true;
}
node.args = data.arg;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the biggest change is that we put the arg, options and body of roles and directives onto the mystDirective node. This allows us to process them afterward in a formatting package - or transform a figure into a directive etc. rather than doing that in the markdown writer.

This shouldn't matter for the duplication as we remove these nodes early in the processing for most things. But it is really helpful to have them for the formatting / writing of markdown.

We might want to change it to node.argsParsed or something so we have the pre/post parsing? Or put it on the data field.

@fwkoch or @agoose77 if you have opinions here, that would help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - seeing this, my first impression was "we already stash this stuff on the mystDirective nodes, don't we...?" - That's not entirely true.

On the initial parse pass, we do save args and options to the mystDirective node; these are the raw, unparsed values. I don't think these fields are used anywhere. Additionally, we create mystDirectiveArg/Option/Body nodes, where the raw value is saved to be parsed, in this applyDirectives function. Within this function we are building a data object of type DirectiveData which contains all the parsed arg/option/body content. Critically, this data object is transient; it's just passed directly to the directive plugins.

So, the change in this PR is taking the parsed, previously transient arg/option/body values and saving them directly to the mystDirective node. I agree this is a good change.


Getting back to your specific question @rowanc1 - are we putting this in the right place? The biggest issue is that we are clobbering raw arg/option values with parsed arg/option values. I think this is fine; we really should only ever be dealing with the parsed values. The only argument I can think of for keeping the raw values is for round-tripping documents exactly as written. But that's not useful at all - MyST does not care about this (I don't think!), we just don't want to lose any data, and for that, parsed structured data is better. (This is especially true if we are thinking about converting between formats - in that case, any raw value is totally useless.)

I guess the other argument against clobbering the raw values with parsed values is simply backwards compatibility. This is a backwards incompatible change that I don't think will impact anyone, but there could be something somewhere using these raw mystDirective fields. (Maybe we make this a minor version bump?)

Regarding the alternatives you suggest:

  1. I don't think we should do argsParsed... that feels excessive; we are keeping track if node.processed === false, so if that's not the case (i.e. the node is processed), args will be parsed. It's also a little silly with bodyParsed since we are not saving the raw body...
  2. Stashing args/options/body on node.data would be fine - this is just taking the DirectiveData object and saving it as "directive data." I somewhat like that, but at the same time, the data field is usually used for metadata - in the case of mystDirective nodes, args/options/body are the fundamental content, not metadata. 🤷

So maybe it's just fine as-implemented? (We could potentially even stop creating the data object, and pass the node as DirectiveData to the run function... But I think for now this is overthinking it...)


TL;DR: 👍 to your change, maybe minor version bump.

Base automatically changed from mdast/lists to main August 13, 2025 14:01
Copy link
Member

@fwkoch fwkoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - I responded to your question @rowanc1 - but after getting to understand this PR a little more, I think we need slight clarity on the direction:

The tests for these improvements, especially around directive/role handling, are focused on a narrow scoped pipeline that only includes myst-parser and myst-to-md. It ignores all the transforms and additional processing included in the full myst-cli pipeline.

For me to feel comfortable with these changes we need to do one of three things:

  1. Clarify the scope of this PR is isolated to standalone usage of myst-parser and myst-to-md; we should not expect to see the same improvements when using the full myst-cli pipeline.
  2. Modify the myst-cli pipeline for md export to perform fewer transforms (including no lifting of mystDirective nodes, no propagating mystTargets, etc).
  3. Refactor the changes here so they provide the same improvements to post-processed myst-cli mdast as they do to unprocessed myst-parser mdast.

Happy to discuss all this with you more!

if (argSpec.required && data.arg == null) {
validationError = true;
}
node.args = data.arg;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - seeing this, my first impression was "we already stash this stuff on the mystDirective nodes, don't we...?" - That's not entirely true.

On the initial parse pass, we do save args and options to the mystDirective node; these are the raw, unparsed values. I don't think these fields are used anywhere. Additionally, we create mystDirectiveArg/Option/Body nodes, where the raw value is saved to be parsed, in this applyDirectives function. Within this function we are building a data object of type DirectiveData which contains all the parsed arg/option/body content. Critically, this data object is transient; it's just passed directly to the directive plugins.

So, the change in this PR is taking the parsed, previously transient arg/option/body values and saving them directly to the mystDirective node. I agree this is a good change.


Getting back to your specific question @rowanc1 - are we putting this in the right place? The biggest issue is that we are clobbering raw arg/option values with parsed arg/option values. I think this is fine; we really should only ever be dealing with the parsed values. The only argument I can think of for keeping the raw values is for round-tripping documents exactly as written. But that's not useful at all - MyST does not care about this (I don't think!), we just don't want to lose any data, and for that, parsed structured data is better. (This is especially true if we are thinking about converting between formats - in that case, any raw value is totally useless.)

I guess the other argument against clobbering the raw values with parsed values is simply backwards compatibility. This is a backwards incompatible change that I don't think will impact anyone, but there could be something somewhere using these raw mystDirective fields. (Maybe we make this a minor version bump?)

Regarding the alternatives you suggest:

  1. I don't think we should do argsParsed... that feels excessive; we are keeping track if node.processed === false, so if that's not the case (i.e. the node is processed), args will be parsed. It's also a little silly with bodyParsed since we are not saving the raw body...
  2. Stashing args/options/body on node.data would be fine - this is just taking the DirectiveData object and saving it as "directive data." I somewhat like that, but at the same time, the data field is usually used for metadata - in the case of mystDirective nodes, args/options/body are the fundamental content, not metadata. 🤷

So maybe it's just fine as-implemented? (We could potentially even stop creating the data object, and pass the node as DirectiveData to the run function... But I think for now this is overthinking it...)


TL;DR: 👍 to your change, maybe minor version bump.

.join(' ')
: '';
const optOther = Object.entries(otherOptions)
.map(([key, value]) => `${key}=${value}`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing all options this way feels risky. Should we check if value has spaces?

const marker = markerChar.repeat(markerLength);
const directiveInline = [node.name, directiveOpts].filter(Boolean).join(' ');
const args = Array.isArray(node.args)
? state.containerPhrasing({ type: 'heading', depth: 1, children: node.args }, info)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this - Are we injecting this heading just to prepend #...? I don't think we always want to do that? But I might just be completely misunderstanding!

````{abc}
````
print("hello world")
value: Ax=b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this value was simplified? I guess maybe backticks no longer matter so much in math since we are using $$ instead of math directive.

But what about the equivalent issue - what if there is a $$ inside the math value...? Are we handling that correctly?

Comment on lines +1008 to +1009
:class: my-class
:name: my-note
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test is quite correct. By the time we have a mystDirective node, I think these options should already be stripped off into a separate field. (Now the test is just treating these as part of the body)

return writeStaticRole('math')(node, _, state);
}
const tracker = state.createTracker(info);
return tracker.move(`$${node.value}$`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this have two $ at the beginning?

*/
function mystDirective(node: any, _: Parent, state: NestedState, info: Info): string {
return writeStaticDirective(node.name, { argsKey: 'args' })(node, _, state, info);
function writeDirective(options?: DirectiveOptions) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm feeling a little confused about this.

First, I don't think we need this wrapper "factory function" - options is unused. We can just define the function directly.

Second, we are only using writeDirective when we encounter mystDirective nodes. These only persist in the tree when they have an unknown directive type. In that case, in the applyDirectives function, we are deleting the children and writing the options back to the body. It never even gets to the new code where node.args/node.options/node.body are set. For known directives, the mystDirective nodes get lifted out.

(my-label)=
# Heading
:::{admonition #my-label .dropdown} Admonition title
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm extra confused why this works - it's a known directive type but it's using the new writeDirective code, which only applies to mystDirective nodes, not admonition nodes.

Comment on lines +33 to +35
const mdast = mystParse(markdown, { mdast: { hoistSingleImagesOutofParagraphs: false } });
const file = unified()
.use(mystToMd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhhh... now I understand. This roundtrip test is checking the myst-parse -> myst-to-md roundtrip. It is not using any of the myst-cli pipeline. Therefore, it's skipping all the basic transforms which are just applied by default in myst-cli.

If we were to try to round-trip via myst-cli, I don't think we'd see the same results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants